Abstract — Wireless sensor nodes continuously observe and sense statistical data from the physical environment. How accurately the sensor nodes collaboratively sense these data, however, is a major issue for wireless sensor networks. In this paper, we therefore describe accuracy models that allow a sensor network to collect accurate data from the physical environment under two conditions. Under the first condition, we propose an accuracy model called the Estimated Data Accuracy (EDA) model, which requires a priori knowledge of the statistics of the physical environment. Simulation results show that the EDA model senses more accurate data from the physical environment than the other information accuracy models considered. Moreover, using the EDA model, there exists an optimal set of sensor nodes that is sufficient to achieve approximately the same data accuracy level as the whole network. Finally, we simulate the EDA model under the threat of malicious attacks in the network caused by an extreme physical environment. Under the second condition, we propose another accuracy model based on the steepest-descent method, called the Adaptive Data Accuracy (ADA) model, which does not require any a priori information about the input signal statistics. We also show that, using the ADA model, there exists an optimal set of sensor nodes that measures accurate data and is sufficient to achieve the same data accuracy level as the whole network. Further, in the ADA model, we reduce the amount of data transmitted by this optimal set of sensor nodes using a model called the Spatio-Temporal Data Prediction (STDP) model. The STDP model captures the spatial and temporal correlation of the sensed data to reduce the communication overhead under data reduction strategies. Finally, using the STDP model, we illustrate a mechanism to trace malicious nodes in the network under an extreme physical environment. Computer simulations illustrate the performance of the EDA, ADA and STDP models.

Index Terms — Wireless sensor networks, data accuracy, spatial correlation, adaptive filter

I. Introduction 

Recent progress in real-time distributed systems has greatly improved the monitoring of continuous data over wireless sensor networks. Such continuous monitoring allows real-world data to be observed in both time and space. In a wireless sensor network, sensor nodes are deployed to monitor a physical phenomenon (e.g., temperature) in the environment over time and space [1]. At a given time instant, the sensor nodes collect data in the space domain and transmit them to the sink node. The major task of the sensor nodes is to collect data from the physical environment. Since the data collected by the sensor nodes are generally spatially correlated [2,3], the nodes need not transmit all of their readings to the sink node; instead, transmitting a subset of the readings is sufficient to maintain a desired accuracy [7-11]. Exploiting spatio-temporally correlated data to transmit only a subset of sensor readings, while maintaining the desired estimated data accuracy at the sink node, is therefore an emerging topic in wireless sensor networks and the key interest of this paper. This procedure can significantly reduce communication overhead and energy consumption in the network.

The first motivation of this paper is to develop accuracy models that allow the network to sense accurate data from the physical environment. To this end, we develop accuracy models under two conditions. First, we propose an accuracy model called the Estimated Data Accuracy (EDA) model, which requires a priori knowledge of the statistics of the physical environment. We compare the performance of the EDA model [7] with other information accuracy models [4-6] and show that the EDA model performs better than these models in selecting an optimal set of sensor nodes in the network. The EDA model, however, requires the exact variances and covariances of the data, that is, prior information about the physical environment. In practice, such prior information about signal statistics is difficult to obtain and to model in real scenarios. Hence, we propose another accuracy model, called the Adaptive Data Accuracy (ADA) model, to overcome this difficulty. To the best of the authors' knowledge, the ADA model is the first model proposed that measures accurate data for the network without requiring any a priori information about the statistics of the environment. The ADA model estimates a desired accuracy collaboratively at the sink node using the adaptive steepest-descent method [15], from an optimal set of sensor nodes instead of all sensor nodes in the network. The data collected with the ADA model are dynamic, and the model does not rely on historical data to estimate the data accuracy at the sink node.

The second motivation of this paper is to reduce the communication overhead of the optimal sensor nodes selected by the ADA model while maintaining a certain degree of data accuracy. These optimal sensor nodes transmit a subset of their readings to the sink node following data reduction strategies [25]. Within these strategies, we use an adaptive LMS filter to reduce the amount of data transmitted by each sensor node when the data in the sensing region are spatially correlated. Under these data reduction strategies, we propose a model called the Spatio-Temporal Data Prediction (STDP) model, which not only reduces the communication overhead but also has the learning and tracking capability to follow internal variations of the statistical signal in the network. The STDP model uses adaptive LMS filters at both the sensor nodes and the sink node. The filter at the sink node performs joint prediction [25] to capture the spatial and temporal correlation of the data among the optimal sensor nodes in the sensing region.

We now explain why the STDP model matters, that is, why it improves on other data prediction models:

(i) In the STDP model, the sink node estimates a global weight vector with an LMS filter, which captures the spatial and temporal correlation of the data among the sensor nodes in the sensing region. This global weight vector summarizes the statistics of the data in the network, so the sink node can form good estimates of the data for the whole network. In other prediction models [23-25], by contrast, the sink node does not capture the spatial and temporal correlation in the sensing region, because a separate weight vector is maintained for each sensor node.

(ii) If a node collects bad or malicious data, the weight vector estimated at the sink node degrades. In the prediction model of [25], each weight vector depends on the data collected by a single node. The STDP model is not restricted to predicting data from a single node: the sink node knows the statistics of the whole data set (the spatio-temporal correlation) of all sensor nodes through a global weight vector. Thus the STDP model performs joint prediction at the sink node for data reduction, which is lacking in [25].

(iii) In [25], the filters at the sink node and the sensor nodes are always active. A filter that is always active at a sensor node consumes energy even when the node does not transmit any data. The STDP model instead uses a switching (ON/OFF) mechanism that lets the filters at the sensor nodes and the sink node become idle. Our STDP model can therefore save more energy than the existing model.

(iv) In [26], a spatio-temporal model uses historical sensed data to estimate the sensor readings of the current period. The STDP model does not use such historical data to estimate the readings of the current period; instead, the spatio-temporal data are refreshed cyclically, after a certain time interval, using a new global weight vector calculated at the sink node.

The third and final motivation of this paper is to verify our proposed models when the network is under the threat of malicious attacks. Network lifetime maximization subject to event constraints and information gathering that maximizes network lifetime subject to energy constraints have been studied [18], [19] without verifying data accuracy. Verifying data accuracy is essential before data aggregation



ACEEE Int. J. on Network Security, Vol. 4, No. 1, July 2013

as aggregation without verification degrades the accuracy level if some of the sensor nodes become malicious [7], [12] due to an extreme physical environment such as heavy rainfall. Aggregating such inaccurate data with the correct data results in poor data aggregation and reduces the data accuracy level at the sink node. We therefore evaluate the EDA model under the threat of malicious attacks, simulating and comparing it both when the network is under attack and when the network is good (not under attack). The EDA model estimates data accuracy under the threat of malicious nodes but cannot trace the number of malicious nodes in the network. Therefore, we finally propose a mechanism based on the STDP model to find the number of malicious nodes in the network, if any.

The rest of the paper is organized as follows. In Section II, we describe the motivation and problem definitions of our work. In Section III, we briefly explain the accuracy model that requires a priori knowledge of the signal statistics of the environment. In Section IV, we explain the accuracy model that does not require a priori knowledge of signal statistics, together with the data reduction model. In Section V, we present simulations and validation, and we conclude our work in Section VI.

II. Overview of Approach and Problem Definitions

The purpose and motivation of this paper are threefold:

(i) In wireless sensor networks, sensor nodes sense statistical data from the physical environment and transmit them to the sink node. How accurately the sensor nodes collaboratively capture these data is, however, a major quality-of-service issue. In this paper, we therefore develop data accuracy models that can extract accurate data from the physical environment, in two situations. In the first situation, the statistics of the data are known a priori, and we propose the Estimated Data Accuracy (EDA) model. The EDA model performs better than other information accuracy models [4-6], and its accuracy can still be met by an optimal set of sensor nodes rather than by all sensor nodes in the network. In the EDA model, the variance and covariance of the sensed data are assumed to be known; in other words, we have a priori knowledge of the input signal statistics of the environment. In the second situation, we propose the Adaptive Data Accuracy (ADA) model, which does not require any a priori knowledge of the input signal statistics. The ADA model can track a continuous data stream over regular time intervals to estimate the required signal. It also extracts accurate data from the physical environment and finds an optimal set of sensor nodes that maintains the desired data accuracy.

(ii) One of the major tasks of sensor nodes in a wireless sensor network is to transmit a subset of their readings to the sink node while achieving a desired data accuracy. To fulfil this task, an optimal set of sensor nodes is first selected in the network using the ADA model, maintaining a certain degree of data accuracy. This optimal set of nodes then transmits a subset of its readings to the sink node, which reduces the amount of data transmitted and the communication overhead. This is done with the Spatio-Temporal Data Prediction (STDP) model, which is developed to capture the signal statistics when the data are correlated among the sensor nodes. The STDP model performs a joint prediction scheme that learns and tracks the internal variation of the signal statistics to adapt itself to the environment, and it reduces the communication overhead of the network based on data reduction strategies.

Figure 1. System Architecture for Data Accuracy Models in Wireless Sensor Networks

(iii) Finally, we verify our EDA and ADA models under the threat of malicious attacks. We evaluate the performance of the EDA model by introducing some malicious nodes into the network and comparing the result with a good network. The EDA model provides no method to find the number of malicious nodes in the network. For the ADA model, however, we provide a mechanism based on the STDP model that can trace the number of malicious nodes in the network and also has learning and tracking capability. Fig. 1 summarizes the motivation of this work. In the following sections, we discuss data accuracy models with and without a priori knowledge of signal statistics, respectively.

III. Data Accuracy Model with a Priori Knowledge of Signal Statistics

In this section, we develop the mathematical foundation of a data accuracy model called the Estimated Data Accuracy (EDA) model, which requires a priori knowledge of the statistics of the physical environment. We also evaluate the EDA model under the threat of malicious attacks.

A. Estimated Data Accuracy Model 

Initially, $M$ sensor nodes are deployed randomly in a sensing region and collaboratively sense a physical phenomenon, the desired signal $d$. For simplicity, we call these $M$ sensor nodes clients. The sink node acts as a server that collects the observations made by the $M$ sensor nodes and forms an estimate $\hat{d}$ of $d$. We construct a mathematical model to estimate the observed data at the sink node. The error signal [13, 14] is defined as

$$ \tilde{d} = d - \hat{d} \tag{1} $$

We find $\hat{d}$ by minimizing the mean square error, i.e., the expectation of $\tilde{d}^2$:

$$ \min E[\tilde{d}^2] \tag{2} $$

The observation made by each sensor node $i$ is given as

$$ u_i = d_i + v_i, \qquad i = 1, 2, \ldots, M \tag{3} $$

where $d_i$ is the value of the phenomenon at node $i$ and $v_i$ is the observation noise.

Figure 2: Architecture of System Model (nodes (clients) $1, \ldots, M$, each holding the pair $\{d_i, u_i\}$, reporting to the sink node)
We assume uncoded transmission of the signals observed by the sensor nodes in the wireless sensor network. Each sensor node $i$ sends a scaled version $Y_i$ [4] of its observed signal $u_i$ to the sink node, subject to the power constraint $p$:

$$ Y_i = \psi_i u_i, \qquad i = 1, 2, \ldots, M $$

where $\psi_i$ is the scaling factor of node $i$. The signal $Y_i$ transmitted by each sensor node $i$ reaches the sink node through an additive white Gaussian noise (AWGN) channel [6], [13]. As shown in Fig. 2, the sink node stores the received signals of all sensor nodes in the vector $U$ as

$$ U = \psi X \tag{4} $$

where $\psi = \mathrm{diag}\{\psi_1, \psi_2, \ldots, \psi_M\}$ and $X = \mathrm{col}\{u_1, u_2, \ldots, u_M\}$.

The sink node recovers the estimate $\hat{d}$ of the signal $d$ observed collaboratively by the $M$ sensor nodes by treating $\hat{d}$ as a function $h$ of $U$:

$$ \hat{d} = h(U) \tag{5} $$

The mean square error to be minimized is therefore

$$ \min_{h} E\big[(d - h(U))^2\big] \tag{6} $$

We restrict $h(U)$ to the subclass of affine functions [15] of $U$, i.e., $\hat{d} = h(U) = WU + g$, where $W$ is a $1 \times M$ row vector and $g$ is a scalar. The affine estimator is taken to be unbiased; since $d$ and $U$ are zero mean, $E(\hat{d}) = W E(U) + g = E(d) = 0$ gives $g = 0$, and the estimator becomes linear:

$$ \hat{d} = WU \tag{7} $$

The optimal value of $W$ at the sink node is the one that solves

$$ \min_{W} E\big[(d - WU)^2\big] \tag{8} $$

We find it with the orthogonality principle [15]: at the optimum, the observation vector $U$ is orthogonal to the error signal $\tilde{d}$. To this end, we define a linear model [7] for (4) as

$$ U = \psi (Z d + V) \tag{9} $$

for the zero-mean random pair $\{d, U\}$ and some $M \times 1$ vector $Z$. Here $V$ is a zero-mean random noise vector with known covariance matrix $E(VV^T) = \sigma_v^2 I$, the variance of $d$ is known, $E(d^2) = \sigma_d^2$, and $d$ and $V$ are uncorrelated. The linear least mean square estimator satisfies the orthogonality condition $E[U\tilde{d}] = E[U(d - WU)] = 0$. Using (9), we obtain

$$ W = E(dU^T)\,[E(UU^T)]^{-1} = \sigma_d^2 Z^T \psi \big[\psi(\sigma_d^2 ZZ^T + \sigma_v^2 I)\psi\big]^{-1} = \frac{\sigma_d^2}{\sigma_d^2 Z^T Z + \sigma_v^2}\, Z^T \psi^{-1} \tag{10} $$

where the last step uses $Z^T(\sigma_d^2 ZZ^T + \sigma_v^2 I)^{-1} = (\sigma_d^2 Z^T Z + \sigma_v^2)^{-1} Z^T$. Hence the linear least mean square estimator of $d$ given $X$ for the $M$ sensor nodes is

$$ \hat{d} = WU = \frac{\sigma_d^2}{\sigma_d^2 Z^T Z + \sigma_v^2}\, Z^T X \tag{11} $$

For the spatially correlated observations in (3), following [7], the estimate of $d$ at the sink node from $M$ sensor nodes is taken as

$$ \hat{d}(M) = \frac{1}{M}\sum_{i=1}^{M} \gamma\, u_i, \qquad \gamma = \frac{\sigma_d^2}{\sigma_d^2 + \sigma_v^2} \tag{12} $$

We define the mean square error between $d$ and $\hat{d}(M)$ to find the estimated data accuracy [7] for $M$ sensor nodes in the network as

$$ D(M) = E\big[(d - \hat{d}(M))^2\big] = E[d^2] - 2E[d\,\hat{d}(M)] + E[\hat{d}(M)^2] \tag{13} $$

The normalized [6, 10] data accuracy $D_A(M)$ for $M$ sensor nodes in the network is given as

$$ D_A(M) = 1 - \frac{D(M)}{E[d^2]} = \frac{1}{E[d^2]}\Big[2E[d\,\hat{d}(M)] - E[\hat{d}(M)^2]\Big] \tag{14} $$
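The closed form of the gain in (10) can be checked numerically. The Python sketch below is not the authors' code; it assumes the linear model (9) with a known vector $Z$, a diagonal scaling $\psi$ with nonzero entries, a scalar $d$ with variance $\sigma_d^2$ and white noise $V$ with covariance $\sigma_v^2 I$. The helper names (`lmmse_gain`, `lmmse_gain_direct`, `monte_carlo_mse`) are hypothetical. It compares $W = \sigma_d^2 Z^T\psi^{-1}/(\sigma_d^2 Z^T Z + \sigma_v^2)$ with the direct solution $W = E(dU^T)[E(UU^T)]^{-1}$ of the orthogonality condition and with a Monte Carlo estimate of the mean square error.

```python
import numpy as np


def lmmse_gain(z, psi, sigma_d2, sigma_v2):
    """Closed-form row vector W = sigma_d^2 Z^T psi^{-1} / (sigma_d^2 Z^T Z + sigma_v^2).

    z   : (M,) vector Z of the linear model U = psi (Z d + V)
    psi : (M,) diagonal entries psi_i of the scaling matrix (assumed nonzero)
    """
    scale = sigma_d2 / (sigma_d2 * (z @ z) + sigma_v2)
    return scale * z / psi


def lmmse_gain_direct(z, psi, sigma_d2, sigma_v2):
    """Same W from the orthogonality condition: W = E[d U^T] (E[U U^T])^{-1}."""
    P = np.diag(psi)
    r_dU = sigma_d2 * (z @ P)                                    # E[d U^T], shape (M,)
    R_UU = P @ (sigma_d2 * np.outer(z, z) + sigma_v2 * np.eye(len(z))) @ P
    return np.linalg.solve(R_UU.T, r_dU)                         # solves W R_UU = r_dU


def monte_carlo_mse(W, z, psi, sigma_d2, sigma_v2, trials=200_000, seed=0):
    """Empirical E[(d - W U)^2] for samples of U = psi (Z d + V)."""
    rng = np.random.default_rng(seed)
    d = rng.normal(0.0, np.sqrt(sigma_d2), size=trials)
    V = rng.normal(0.0, np.sqrt(sigma_v2), size=(trials, len(z)))
    U = (np.outer(d, z) + V) * psi                               # row t is psi (Z d_t + V_t)
    return np.mean((d - U @ W) ** 2)


z = np.ones(4)
psi = np.array([0.5, 1.0, 1.5, 2.0])
W = lmmse_gain(z, psi, 1.0, 0.5)
analytic_mse = 1.0 - W @ (1.0 * z * psi)                         # sigma_d^2 - W E[U d]
print(np.allclose(W, lmmse_gain_direct(z, psi, 1.0, 0.5)), analytic_mse,
      monte_carlo_mse(W, z, psi, 1.0, 0.5))
```

Because $\psi^{-1}$ undoes the scaling, the minimum error does not depend on $\psi$; with $Z = \mathrm{col}\{1, \ldots, 1\}$ it reduces to $\sigma_d^2\sigma_v^2/(M\sigma_d^2 + \sigma_v^2)$.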

The normalized data accuracy $D_A(M)$ for $M$ sensor nodes in the sensing region can be evaluated with a spatial correlation model [7] of the network.

Now we model the spatially correlated physical phenomenon sensed by the $M$ sensor nodes as jointly Gaussian random variables (JGRVs) [4, 5] with $E[d_i] = 0$ and $\mathrm{var}[d_i] = \sigma_d^2$. The covariance between $d$ and $d_i$ is

$$ \mathrm{cov}[d, d_i] = E[d\,d_i] = \sigma_d^2 K(dis_{d,i}) $$

Similarly, the covariance between $d_i$ and $d_j$ is

$$ \mathrm{cov}[d_i, d_j] = E[d_i d_j] = \sigma_d^2 K(dis_{i,j}) $$

In the covariance model $K(dis_{i,j})$ [16], $dis_{i,j}$ is the Euclidean distance between nodes $i$ and $j$, and $dis_{d,i}$ is the distance between node $i$ and the location of $d$. The covariance function is non-negative and decreases monotonically with distance, with limiting values of 1 at $dis = 0$ and 0 at $dis = \infty$. We take the power exponential model [17], i.e.,

$$ K(dis_{i,j}) = e^{-(dis_{i,j}/\theta)}, \qquad \theta > 0 $$

where $\theta$ is called the range parameter. The range parameter controls the relation between the distance separating sensor nodes $i$ and $j$ and their correlation coefficient $\rho(i, j)$. Using this correlation model, we obtain $\rho_{d,d_i} = e^{-(dis_{d,i}/\theta)}$ and $\rho_{d_i,d_j} = e^{-(dis_{i,j}/\theta)}$.

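Using (3) and (12) in (14), with $E[d^2] = \sigma_d^2$, $\gamma = \sigma_d^2/(\sigma_d^2 + \sigma_v^2)$, observation noises $v_i$ that are uncorrelated with each other and with the signal, and the correlation coefficients $\rho_{d,d_i}$ and $\rho_{d_i,d_j}$ defined above, we obtain the EDA model of the network as

$$ D_A(M) = \frac{2\gamma}{M}\sum_{i=1}^{M}\rho_{d,d_i} - \frac{\gamma}{M} - \frac{\gamma^2}{M^2}\sum_{i=1}^{M}\sum_{\substack{j=1 \\ j \neq i}}^{M}\rho_{d_i,d_j} \tag{15} $$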
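To make (15) concrete, the following Python sketch evaluates $D_A(M) = \frac{2\gamma}{M}\sum_i \rho_{d,d_i} - \frac{\gamma}{M} - \frac{\gamma^2}{M^2}\sum_i\sum_{j \neq i}\rho_{d_i,d_j}$ with the exponential correlation $e^{-dis/\theta}$. It assumes 2-D node coordinates and a known location for $d$. The function names, and the nearest-first rule in `smallest_sufficient_set`, are hypothetical illustrations of searching for a smaller set of nodes with nearly the full-network accuracy; they are not the selection procedure of this paper.

```python
import numpy as np


def exp_corr(dist, theta):
    """Power exponential correlation K(dis) = exp(-dis / theta), theta > 0."""
    return np.exp(-np.asarray(dist, dtype=float) / theta)


def eda_accuracy(nodes, source, sigma_d2, sigma_v2, theta):
    """Normalized data accuracy D_A(M) of (15) for the M nodes in `nodes`.

    nodes  : (M, 2) node coordinates (hypothetical input format)
    source : (2,) coordinates of the point whose value d is estimated
    """
    nodes = np.atleast_2d(np.asarray(nodes, dtype=float))
    source = np.asarray(source, dtype=float)
    M = nodes.shape[0]
    gamma = sigma_d2 / (sigma_d2 + sigma_v2)
    rho_d = exp_corr(np.linalg.norm(nodes - source, axis=1), theta)   # rho_{d,d_i}
    pair = np.linalg.norm(nodes[:, None, :] - nodes[None, :, :], axis=2)
    rho = exp_corr(pair, theta)                                         # rho_{d_i,d_j}
    off_diag = rho.sum() - np.trace(rho)                                # sum over j != i
    return 2 * gamma / M * rho_d.sum() - gamma / M - gamma**2 / M**2 * off_diag


def smallest_sufficient_set(nodes, source, sigma_d2, sigma_v2, theta, tol=0.01):
    """Hypothetical rule: add nodes nearest to the source first and return the
    smallest M whose D_A(M) is within `tol` of D_A for all nodes."""
    nodes = np.asarray(nodes, dtype=float)
    order = np.argsort(np.linalg.norm(nodes - np.asarray(source, dtype=float), axis=1))
    target = eda_accuracy(nodes, source, sigma_d2, sigma_v2, theta)
    for m in range(1, len(nodes) + 1):
        if eda_accuracy(nodes[order[:m]], source, sigma_d2, sigma_v2, theta) >= target - tol:
            return m
    return len(nodes)


src = np.array([0.0, 0.0])
pts = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
print(eda_accuracy(pts, src, 1.0, 0.5, 5.0), smallest_sufficient_set(pts, src, 1.0, 0.5, 5.0))
```

For a single node located at the source ($M = 1$, $\rho_{d,d_1} = 1$), the expression reduces to $D_A(1) = \gamma$, the familiar accuracy of one noisy observation.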



B. EDA Model under the Threat of Malicious Attacks

Data gathering and data aggregation are traditionally performed subject to energy constraints [19] or to maximize network lifetime [18]. These procedures do not verify data accuracy before aggregation, which causes problems if some of the sensor nodes become malicious [7]. A sensor node may become malicious because of an extreme physical environment, e.g., heavy rainfall or snowfall. Malicious nodes sense inaccurate readings and transmit them to the sink node, which aggregates them with the correct data sent by the other sensor nodes. The sink node thus estimates inaccurate readings, resulting in poor data gathering in the network. We evaluate the EDA model in this situation, when the network is under the threat of malicious attacks.

IV. Data Accuracy Model without a Priori Knowledge of Signal Statistics

In this section, we first construct the mathematical foundation for selecting the optimal sensor nodes with a model called the Adaptive Data Accuracy (ADA) model. The ADA model does not require a priori knowledge of the input signal statistics to estimate the required signal. The optimal sensor nodes selected with the ADA model perform data transmission while maintaining the desired data accuracy. Data transmission can be reduced further with a methodology called the Spatio-Temporal Data Prediction (STDP) model, which captures the spatial and temporal correlation of the data among the sensor nodes to reduce the communication overhead in the network using an adaptive LMS filter. Finally, using the STDP model, we provide a mechanism to trace malicious nodes in the network.

We consider $M$ sensor nodes randomly distributed over a wireless sensor network. When the sink node issues a query, the sensor nodes start sensing the physical phenomenon (e.g., temperature) and transmit the data to the sink node. For simplicity, we call the sensor nodes clients and the sink node the central server, as shown in Fig. 2. The clients and the server play the following roles in data transmission.

Clients: Each sensor node $i$ senses and observes the physical phenomenon in the wireless sensor network. Each sensor node collects a continuous block [20] of $N$ data samples over a time window:

$$ u_i = [u_i(1), u_i(2), \ldots, u_i(N)] \;\; (1 \times N), \qquad i = 1, \ldots, M \tag{16} $$

The corresponding scalar measurement (desired signal) of each sensor node is

$$ d_i = u_i w + v_i, \qquad i = 1, \ldots, M \tag{17} $$

where $w$ is an unknown $N \times 1$ weight vector on the client side and $v_i$ is temporally and spatially uncorrelated white noise. Each sensor node $i$ transmits its observation $u_i$ to the sink node through an additive white Gaussian noise (AWGN) channel [6], [13].

Server: The sink node stores the observations received from all sensor nodes in the matrix $U$ and the corresponding desired signals in the vector $d$:

$$ U = \mathrm{col}\{u_1, u_2, \ldots, u_M\} \;\; (M \times N) \tag{18} $$

$$ d = \mathrm{col}\{d_1, d_2, \ldots, d_M\} \;\; (M \times 1) \tag{19} $$

A. Adaptive Data Accuracy Model 

We propose a model called the Adaptive Data Accuracy (ADA) model, which does not require a priori knowledge of signal statistics and can track a continuous data stream over regular time intervals when the data in the sensing region are spatially correlated.

Since the $M$ sensor nodes are randomly deployed, the data they observe are spatially correlated. We can therefore reduce the number of sensor nodes while maintaining approximately the same data accuracy level as with all $M$ nodes. To do so, we perform minimum mean square error (MMSE) estimation [14] with an adaptive approach [21] at the sink node, reducing the number of sensor nodes subject to a data accuracy requirement for the spatially correlated data sensed by the $M$ sensor nodes.

Our aim is to minimize the cost function [22]

$$ J(w) = E\,|d - Uw|^2 \tag{20} $$

whose optimal solution $w^o$ satisfies the normal equation $R_{du} = R_{uu} w^o$. Here $U$ is a $1 \times M$ row observation vector, $w$ is an $M \times 1$ weight vector, and $d$ is the scalar desired signal calculated at the sink node.
Expanding (20), we obtain the mean square error of the ADA model:

$$ J(w) = \sigma_d^2 - R_{du}^T w - w^T R_{du} + w^T R_{uu} w \tag{21} $$

where $R_{uu} = E[U^T U]$, $R_{du} = E[U^T d]$ and $\sigma_d^2 = E|d|^2$.
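Using the adaptive steepest-descent method [15], [22], the sink node obtains the global statistical information $\{R_{uu}, R_{du}\}$ for the spatially correlated data in the network as follows:

$$ R_{uu} = E[U^T U] = \begin{bmatrix} E[u_1 u_1] & \cdots & E[u_1 u_M] \\ \vdots & \ddots & \vdots \\ E[u_M u_1] & \cdots & E[u_M u_M] \end{bmatrix} = \begin{bmatrix} \sigma_{u_1}\sigma_{u_1}\rho_{u_1u_1} & \cdots & \sigma_{u_1}\sigma_{u_M}\rho_{u_1u_M} \\ \vdots & \ddots & \vdots \\ \sigma_{u_M}\sigma_{u_1}\rho_{u_Mu_1} & \cdots & \sigma_{u_M}\sigma_{u_M}\rho_{u_Mu_M} \end{bmatrix}_{(M \times M)} \tag{22} $$

Similarly,

$$ R_{du} = E[U^T d] = \begin{bmatrix} E[d u_1] \\ \vdots \\ E[d u_M] \end{bmatrix} = \sigma_d \begin{bmatrix} \sigma_{u_1}\rho_{du_1} \\ \vdots \\ \sigma_{u_M}\rho_{du_M} \end{bmatrix}_{(M \times 1)} \tag{23} $$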

We model the spatially correlated data as jointly Gaussian random variables (JGRVs) [4], [5] as follows:

$$ d = \frac{1}{M}\sum_{i=1}^{M} d_i, \qquad E|d|^2 = \sigma_d^2, \qquad E[u_i] = \frac{1}{N}\sum_{j=1}^{N} u_i(j) $$

where $i = 1, 2, \ldots, M$ indexes the nodes and $j = 1, 2, \ldots, N$ the samples of each node. The standard deviation of $u_i$ is $\sigma_{u_i}$ and that of $d$ is $\sigma_d$. The covariance between $u_i$ and $u_j$ is $\mathrm{Cov}[u_i, u_j] = E[u_i u_j] = \sigma_{u_i}\sigma_{u_j}\rho_{u_iu_j}$, where $\rho_{u_iu_j}$ is the correlation coefficient between $u_i$ and $u_j$; similarly, $\rho_{du_i}$ is the correlation coefficient between $d$ and $u_i$ for $i = 1, 2, \ldots, M$. We define the correlation model [4] $K(\cdot)$ as $K(dis_{i,j}) = \rho$, where $dis_{i,j}$ is the Euclidean distance between sensor nodes $i$ and $j$. We adopt the power exponential model [16, 17] as the correlation model, $K_{PE}(dis_{i,j}) = e^{-(dis_{i,j}/\theta)}$ with $\theta > 0$, where $\theta$ is the range parameter [4], [7]. Thus

$$ \rho_{u_iu_j} = e^{-(dis_{i,j}/\theta)} \quad \text{and} \quad \rho_{du_i} = e^{-(dis_{d,i}/\theta)} $$

We start from an initial guess for $w$ and refine it recursively until it converges to the optimal $w^o$. The weight vector of the ADA model is updated as

$$ w_k = w_{k-1} + \mu\,[R_{du} - R_{uu} w_{k-1}] \quad (M \times 1) \tag{24} $$

where $w_k$ is the new guess after $k$ iterations, $w_{k-1}$ is the old guess after $k-1$ iterations, and $\mu > 0$ is a positive step-size parameter [15], [22], [27], calculated from the spatial correlation of the data among the sensor nodes as follows. $R_{uu}$ is decomposed as

$$ R_{uu} = V \Lambda V^T \tag{25} $$

where $\Lambda$ is a diagonal matrix of the (positive) eigenvalues and $V$ is a unitary matrix satisfying $VV^T = V^T V = I$. We pick the largest eigenvalue $\lambda_{max}$ and choose $\mu$ according to [15] as

$$ 0 < \mu < \frac{2}{\lambda_{max}} \tag{26} $$

Substituting the optimal weight vector obtained from (24) into the cost function $J(w)$ in (21), we obtain the normalized data accuracy at the sink node for the network as

$$ \mathrm{Accuracy}(M) = 1 - \frac{J(w^o)}{\sigma_d^2} \tag{27} $$

This normalized Accuracy(M), calculated at the sink node with the ADA model, is used to find the optimal number of sensor nodes in the network subject to data accuracy. Thus the ADA model does not require a priori knowledge of the input signal statistics to calculate the data accuracy at the sink node, and it can track a continuous data stream over regular time intervals in the sensor network.
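The following Python sketch, under stated assumptions, runs the steepest-descent recursion (24) with a step size inside the bound (26) and evaluates Accuracy(M) from (21) and (27). It assumes that $R_{uu}$ and $R_{du}$ are built from the exponential correlation model with known node coordinates, a known location for $d$, and known standard deviations; the function names and the example layout are hypothetical.

```python
import numpy as np


def build_statistics(nodes, source, sigma_d, sigma_u, theta):
    """R_uu (22) and R_du (23) under the power exponential model rho = exp(-dis/theta)."""
    nodes = np.asarray(nodes, dtype=float)
    sigma_u = np.asarray(sigma_u, dtype=float)
    dist = np.linalg.norm(nodes[:, None, :] - nodes[None, :, :], axis=2)
    R_uu = np.outer(sigma_u, sigma_u) * np.exp(-dist / theta)   # sigma_ui sigma_uj rho_uiuj
    dist_d = np.linalg.norm(nodes - np.asarray(source, dtype=float), axis=1)
    R_du = sigma_d * sigma_u * np.exp(-dist_d / theta)            # sigma_d sigma_ui rho_dui
    return R_uu, R_du


def steepest_descent(R_uu, R_du, iters=5000, step_fraction=0.5):
    """Recursion (24): w_k = w_{k-1} + mu (R_du - R_uu w_{k-1}), with mu inside (26)."""
    if not 0.0 < step_fraction < 1.0:
        raise ValueError("step_fraction must lie in (0, 1) so that 0 < mu < 2/lambda_max")
    lam_max = np.linalg.eigvalsh(R_uu).max()   # largest eigenvalue of R_uu = V Lambda V^T
    mu = step_fraction * 2.0 / lam_max
    w = np.zeros_like(R_du)                    # initial guess
    for _ in range(iters):
        w = w + mu * (R_du - R_uu @ w)
    return w


def ada_accuracy(w, R_uu, R_du, sigma_d2):
    """Accuracy(M) = 1 - J(w) / sigma_d^2, with J(w) expanded as in (21)."""
    J = sigma_d2 - R_du @ w - w @ R_du + w @ R_uu @ w
    return 1.0 - J / sigma_d2


nodes = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
source = np.array([1.0, 1.0])
R_uu, R_du = build_statistics(nodes, source, 1.0, np.ones(4), 2.0)
w = steepest_descent(R_uu, R_du)
print(w, ada_accuracy(w, R_uu, R_du, 1.0))
```

Because adding nodes can never increase the minimum of $J(w)$, the accuracy computed for a subset of nodes is at most the accuracy for all nodes, which is what makes it meaningful to look for a smaller set that comes close to the full-network value.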

B. Spatio-Temporal Data Prediction Model 

Once the optimal sensor nodes have been selected with the ADA model while maintaining a certain level of data accuracy, these nodes perform data transmission in the network. We develop a methodology called the Spatio-Temporal Data Prediction (STDP) model to further reduce the data transmitted by these optimal sensor nodes. STDP captures the spatial and temporal correlation of the data to reduce the communication overhead among the sensor nodes based on data reduction strategies, in which the sensor nodes transmit only a subset of the data stream to the sink node instead of the whole stream. The STDP model performs a joint prediction scheme to capture the spatial correlation among sensor nodes and thereby reduce the communication overhead. Moreover, our approach does not require any a priori knowledge of the input signal statistics, and it has the learning and tracking capability to follow the internal variation of the signal statistics.

When the sink node (server) issues a query to all sensor nodes (clients), the STDP model starts transmitting data to the sink node for joint prediction as follows.

Phase I (client): Each sensor node $i$ transmits its spatially correlated data $u_i$ according to (16) and the corresponding measured data $d_i$ according to (17), along with its initial weight vector $w_0^{(i)}$, to the sink node. The initial weight vector is set to $w_0^{(i)} = \mathrm{col}\{1, 1, \ldots, 1\}$ $(N \times 1)$ [22]. At this moment, the filter at each sensor node and the filter to be used at the sink node are kept idle.

Phase II Server: Sink node store the received u t obser- 
vation transmitted from j sensor nodes in JJ matrix and 
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d i in d matrix according to (18) and (19) respectively. Using 
adaptive Steepest-decent method [15], [22] , sink node find 
another global statistical information {R^, R DU } for the spa- 
tially correlated data in the network as follows 

R uu =E[U T U](A^x^) and R DU =[U T d] (Nxl) (28) 
Hence using this method [22], the estimated global weighted 
vector calculated at the sink node is given as 

M 

< b =*C + »Tl R nu* -Ruu^-i) (Nxl) (29) 

According to instantaneous approximations [22], the global 
statistical information can be written as 

RjjUJ = U 'i U i ( NxN ) and R DUJ = U i^i ^ xl > 

for ieM (30) 
Hence using instantaneous approximations, the global 

weighted vector w? (29) can be modified as 



(Nxl) 



the error threshold (a) value, the global weighted vector 



M , 

Glob(lA) Glob , 'V/ T 7 I T \ Glob 

i=l 



for ieM (31) 
Similarly using LMS filter [22], the global weighted vector 

w k 3,ob calculated at the sink node is given as 



M 

;=1 



for i e M (32) 
Now w f lab{LMS) j s use d t calculate the prediction filters 



(for m nodes) at sink as y„ . = UW 



,Glob(LMS) 



This realizes that 



at this moment, the prediction filters at sink node are active 

to calculate y slnk and prediction filter (y ; ) used at each 

sensor node ; is still kept idle. Finally we use y sink to calculate 
the prediction error at sink node as 

Glob 



error =[d-y Sink ] 



(33) 



Glob(LMS) 



calculated at the sink node is transferred to the 
corresponding sensor node in the network. This 



transmission of 



Glob(LMS) 



is like a request from the sink 
node to sensor node for stopping the transmission of 



data [23] . Once w 



Glob(LMS) ■ 



is transferred from the sink node 



to sensor node, filter residing at sink node for the sensor 
node goes to ideal and it goes to prediction mode , finally 
filter residing for the corresponding sensor node is yet to 
become active. 



Phase HI Client: Once 



Glob(LMS) 



is received by each 



sensor node (conditioned error calculated at the sink node 
for the respective node is less than the threshold), it (client) 

uses vi/ G ' o6<iMS) to calculate its new weighted vector, filter 

and error. Since U i observation is sensed by sensor node / , 

the desired signal scalar value calculated by each sensor 
node is given as 

r new Glob(LMS) , . a\ 

a . = u .w k + v i where ; e M (34) 

Hence the new updated weighted vector calculated at each 
sensor node / in the network is given as 



<7 = w 



+ M (uj(dr-u t w iki ) ) ) (Nxl) 



for i e M (35) 
Now each sensor node finds its individual filter update value 

as y'.""' =u j w""'' and finally the scalar error value calculated 

for each sensor node i in the network is given as 



new V inew new] 

error x =\d { -y i J 



(36) 



We define a user-defined error threshold ($\alpha$) to distinguish the following two conditions:

• If the error $error^{Glob}$ (a scalar value) calculated at the sink node for the respective node is greater than the error threshold ($\alpha$), then the corresponding sensor node continues to send data to the sink node. This lets the filter at the sink node adapt to the data received from the corresponding node, so it stays in adaptive mode. In this situation, the filter at the sink node (server) for this node is active, and the filter at the corresponding node (client) is still kept idle.

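• But if $error^{Glob}$ for the respective node is less than the error threshold ($\alpha$), the sink node transfers $w_k^{Glob(LMS)}$ to the corresponding sensor node, and the filter at the sink node for this node goes idle and enters prediction mode.

Again, using another user-defined error threshold ($\beta$), we illustrate two conditions at the sensor node: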

• If $error_i^{new}$ calculated at the sensor node is greater than the error threshold ($\beta$), the observation $u_i$ sensed by it is still transmitted to the sink node. At this time, the filter at the node is active and in adaptive mode, while the filter at the sink for the corresponding node is set idle.

• If $error_i^{new}$ calculated at the sensor node is less than the error threshold ($\beta$), the sensor node stops transmitting the observation $u_i$ to the sink node. Once data transmission has stopped, sensor node $i$ transfers its $w_{i,k}^{new}$ to the sink node. This transmission of $w_{i,k}^{new}$ acts as a response from the sensor node to the sink node indicating that data transmission has stopped. At this moment, the filter at the node is set idle and enters prediction mode. The sink node uses the $w_{i,k}^{new}$ transmitted from each sensor node to track the signal statistics of each sensor node $i$ in the network. Once $error_i^{new} > \beta$, the same process as explained in Phase I is repeated.

Thus, using these three phases, the STDP model applies data reduction strategies to reduce the communication overhead of the optimal sensor nodes in the network. The STDP model can track and learn the internal variation of the signal statistics without requiring a priori knowledge of the environment. Moreover, it performs joint prediction for the optimal sensor nodes to capture the spatial and temporal correlation of data among sensor nodes in the network.
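The phase logic above can be approximated by a generic dual-prediction loop, sketched below under simplifying assumptions: a single node, a purely temporal linear predictor of hypothetical length `order`, one threshold playing the role of β (the sink-side α is folded into it), and a normalized LMS (NLMS) update used in place of the plain LMS step of Eq. (35) so that the step size does not depend on the signal scale. Because both ends apply identical updates to identical histories, the sink can reproduce every suppressed reading to within β. This is not the authors' implementation and it ignores the spatial (multi-node) part of STDP.

```python
import numpy as np

def dual_prediction(readings, beta, order=3, mu=0.5, eps=1e-8):
    """Threshold-driven dual prediction between one node and the sink (sketch).

    Node and sink run identical NLMS predictors on identical histories, so
    the sink can reproduce every prediction the node makes. The node sends
    a reading only when |prediction error| > beta.
    Returns (fraction of readings transmitted, sink-side reconstruction).
    """
    x = np.asarray(readings, dtype=float)
    if len(x) <= order:
        return 1.0, x.copy()              # too short: everything is sent
    w = np.zeros(order)                   # weight vector, kept in sync on both ends
    history = list(x[:order])             # the first `order` readings are always sent
    recon = list(x[:order])
    sent = order
    for t in range(order, len(x)):
        u = np.array(history[-order:][::-1])  # most recent value first
        y = float(u @ w)                       # prediction made at both ends
        e = x[t] - y                           # computed at the node
        if abs(e) > beta:
            # Large error: send the true reading; the sink recomputes e from it
            # and both ends apply the same NLMS update.
            w = w + mu * e * u / (eps + u @ u)
            history.append(x[t])
            recon.append(x[t])
            sent += 1
        else:
            # Small error: node stays silent and both ends use the prediction.
            history.append(y)
            recon.append(y)
    return sent / len(x), np.array(recon)
```

For example, `dual_prediction(20 + 10 * np.sin(np.linspace(0, 6, 200)), beta=0.5)` returns the transmitted fraction together with a reconstruction whose per-sample error never exceeds 0.5; sweeping `beta` gives a transmission-versus-threshold curve of the kind reported in Section V.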

C. Tracing Malicious Nodes in the Network 

In the EDA model, we evaluate data accuracy for the network when some of the sensor nodes become malicious, but the model does not trace and discard the malicious nodes from the network to obtain better data accuracy. Hence, using the STDP model, we propose a novel methodology to trace the malicious nodes in the network. Since M sensor nodes are randomly deployed over a sensing region, we assume that some of the nodes become malicious due to the extreme physical environment, but we do not know the exact number of malicious nodes among the M sensor nodes in the network. Our goal is to trace these malicious nodes. Whether a node is malicious depends on the statistical behavior of its observation $u_i$. We repeat the procedure of Phases I-III of the STDP model, where the weight vector $w_{i,k}^{new}$ provides the statistical information of each node used to trace malicious behavior. If a node is malicious, then the statistical value of its weight vector ($w_{mal\_i,k}^{new}$) differs from that of the normal nodes [7]. Finally, we can trace the malicious nodes and discard them from the network to obtain better data accuracy and data aggregation under spatially correlated data.
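The text does not specify the decision rule for calling a weight vector "different from the normal nodes". One hypothetical rule, sketched below, summarizes each node's reported weight vectors by their variance over time and flags nodes whose variance is a robust outlier: more than `k` median absolute deviations above the median, with `k = 3` an arbitrary choice. The function name and the input layout are assumptions.

```python
import numpy as np

def flag_malicious(weight_traces, k=3.0):
    """Flag nodes whose weight-vector variability is an outlier (sketch).

    weight_traces: dict node_id -> array of shape (T, N) holding the
    successive weight vectors w_{i,k}^{new} reported to the sink.
    A node is flagged when its variability exceeds the median variability
    by more than k median absolute deviations (MAD).
    """
    ids = list(weight_traces)
    # One scalar per node: per-tap variance over time, averaged over the N taps.
    spread = np.array([
        np.var(np.asarray(weight_traces[i], dtype=float), axis=0).mean()
        for i in ids
    ])
    med = np.median(spread)
    mad = np.median(np.abs(spread - med)) + 1e-12   # guard against a zero MAD
    return [i for i, s in zip(ids, spread) if (s - med) / mad > k]
```

Using the median and MAD rather than the mean and standard deviation keeps the threshold from being pulled upward by the malicious nodes themselves.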

V. Performance Evaluations and Validations 

In this section, simulations are performed in MATLAB to validate the effectiveness of our proposed data accuracy models with and without a priori knowledge of the signal statistics of the physical environment, respectively. For the simulations, a grid-based wireless sensor topology over a 4 m × 4 m sensing region is used, with one sink node in the network. We deploy ten sensor nodes in the sensing region. Each sensor node senses observations (e.g., temperature) from the physical environment and transmits them to the sink node.

A. Performance Evaluation of Data Accuracy Model with a
Priori Knowledge of Signal Statistics

Here we simulate the Estimated Data Accuracy (EDA) model, which requires a priori knowledge of the physical environment, and compare it with the other information accuracy models. Moreover, we also simulate the EDA model under the threat of malicious attacks.



Performance of EDA model compared to other accuracy models: We perform the simulation for the EDA model with ten sensor nodes, as shown in Fig. 3. The results show that EDA senses more accurate data and performs better than the other information accuracy models [4-6]. Moreover, as we keep increasing the number of sensor nodes up to ten, the data accuracy becomes approximately constant once about six nodes are deployed. This shows that six sensor nodes are sufficient to achieve the same estimated data accuracy level as deploying ten sensor nodes. Hence, it is unnecessary to use all ten sensor nodes: six sensor nodes are sufficient to perform the communication process while maintaining the desired data accuracy, and the remaining sensor nodes can be put in sleep mode. Thus, an optimal set of sensor nodes is sufficient to maintain the desired data accuracy level using the EDA model in the network.
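The "optimal set" argument amounts to picking the knee of the accuracy-versus-nodes curve. A minimal sketch, assuming the accuracy is available as a list indexed by node count and that "approximately the same" means within a hypothetical absolute tolerance `tol` of the accuracy obtained with all deployed nodes:

```python
def smallest_sufficient_nodes(accuracy, tol=0.01):
    """Return the smallest node count whose accuracy is within `tol`
    (absolute) of the accuracy obtained with all deployed nodes.

    accuracy[n - 1] is the data accuracy measured with n nodes.
    """
    if not accuracy:
        raise ValueError("accuracy curve is empty")
    target = accuracy[-1] - tol
    # Scan from one node upward and stop at the first count that reaches the target.
    for n, acc in enumerate(accuracy, start=1):
        if acc >= target:
            return n
    return len(accuracy)  # only reachable when tol is negative
```

The same helper applies unchanged to an ADA accuracy curve; the choice of `tol` is a design decision that trades accuracy against the number of active nodes.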

[Figure 3 plot: data accuracy (y-axis, about 0.78 to 0.96) versus number of sensor nodes (x-axis, 1 to 10)]

Figure 3. Number of sensor nodes versus data accuracy

Performance of EDA model under the threat of malicious attacks: We assume that some of the sensor nodes become malicious due to an extreme physical environment, such as heavy rainfall. In such a situation, the noise variances of malicious nodes are much higher than those of normal nodes; normal nodes are good nodes that are not under malicious attack. In this simulation setup, we compare two deployment strategies for the EDA model, as shown in Fig. 4. In the first strategy, we initially deploy five sensor nodes and keep adding sensor nodes up to ten; all of these nodes are normal. In the second strategy, we initially deploy five sensor nodes in the same way, but two of these five sensor nodes are assumed to be malicious, and we keep adding sensor nodes until there are ten sensor nodes in the network. Comparing these two deployment strategies, we conclude that the sink node estimates more accurate data when all nodes in the network are normal. If some sensor nodes are malicious, the sink node estimates inaccurate data and performs poor data gathering for the network.

[Figure 4 plot: data accuracy versus deployed sensor network, with normal nodes and with malicious nodes]

Figure 4. Comparison of data accuracy under normal nodes and under the threat of malicious nodes in the EDA model
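A back-of-the-envelope calculation shows why a few high-variance nodes can degrade the sink's estimate. It assumes (this is not from the paper) that the sink simply averages independent, unbiased observations with equal weights; the EDA estimator itself may weight nodes differently. The variances 1 (normal) and 10 (malicious) are hypothetical.

```python
def mse_of_average(noise_variances):
    """MSE of the sink's equal-weight average of independent, unbiased
    observations with the given noise variances: sum(sigma_i^2) / n^2."""
    n = len(noise_variances)
    if n == 0:
        raise ValueError("need at least one observation")
    return sum(noise_variances) / n**2

normal = mse_of_average([1.0] * 5)                      # five normal nodes -> 0.2
attacked = mse_of_average([1.0, 1.0, 1.0, 10.0, 10.0])  # two of five malicious -> 0.92
```

Under these assumptions, two malicious nodes out of five raise the mean squared error more than fourfold, which is consistent in direction with the gap between the two deployment strategies.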

B. Performance Evaluation of Data Accuracy Model without 
a Priori Knowledge of Signal Statistics 

We simulate an accuracy model called the Adaptive Data Accuracy (ADA) model, which does not require a priori knowledge of the physical environment. Using the ADA model, we show that there exists an optimal set of sensor nodes that is sufficient to achieve the desired data accuracy level. These optimal sensor nodes are selected using the ADA model to perform the communication process while maintaining a certain degree of data accuracy. Further, the Spatio-Temporal Data Prediction (STDP) model is used to reduce the data transmission of this optimal set of sensor nodes. Finally, using the STDP model, we describe a mechanism to trace the malicious nodes in the network.

Since the ADA and STDP models can track the internal variations of the signal statistics and adapt to the physical environment, we consider for our experiment an environment where the signal statistics (e.g., temperature) vary strongly over certain durations. The ADA and STDP models can track such variations well in a tropical desert area like Jaisalmer (Rajasthan, India). On 26th January 2012, the minimum and maximum temperatures recorded in Jaisalmer were 7 °C and 22 °C, respectively, according to [28]. Such temperature variations over a particular duration are a more interesting subject for measuring the variation of signal statistics with the ADA and STDP models than room temperature, whose variation is very small, for example 26 °C to 30 °C over a particular duration. Hence, for our simulations, we generate random temperature data in MATLAB, which are sensed by the sensor nodes, to validate our results for the ADA and STDP models. The sensor nodes report random temperature data once every 15 minutes over one day (26th January 2012) in Jaisalmer. Another example is a subtropical highland climate such as Mawsynram and Cherrapunji (Meghalaya, India), where our models can also track the variations of the signal statistics (rainfall measured in mm) in the sensing region.
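The text does not specify how the random temperature data were generated in MATLAB. As an illustration only, the sketch below produces a comparable trace: 96 samples (one every 15 minutes over 24 hours) following a cosine daily cycle between the reported minimum (7 °C) and maximum (22 °C), plus optional Gaussian noise. The cosine shape, the 15:00 peak and the noise level are assumptions.

```python
import numpy as np

def synthetic_day(t_min=7.0, t_max=22.0, step_min=15, peak_hour=15,
                  noise_std=0.0, seed=None):
    """Synthetic one-day temperature trace sampled every `step_min` minutes.

    A cosine daily cycle between t_min and t_max that peaks at `peak_hour`,
    plus optional Gaussian noise with standard deviation `noise_std`.
    """
    n = 24 * 60 // step_min                 # 96 samples for 15-minute steps
    k = np.arange(n)
    k_peak = peak_hour * 60 // step_min     # sample index of the daily maximum
    mean, amp = (t_max + t_min) / 2, (t_max - t_min) / 2
    temps = mean + amp * np.cos(2 * np.pi * (k - k_peak) / n)
    rng = np.random.default_rng(seed)
    return temps + noise_std * rng.standard_normal(n)
```

With `noise_std=0` the trace reaches exactly 7 °C and 22 °C; a positive `noise_std` adds the sample-to-sample randomness that the adaptive filters have to track.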
Performance of ADA model to select optimal sensor nodes: In this simulation, we estimate the data accuracy of the signal statistics at the sink node for all ten deployed sensor nodes in the sensing region using the ADA model. Fig. 5 shows the data accuracy of the signal statistics at the sink node for the ADA model with respect to the number of sensor nodes in the network. The simulation results show that about six sensor nodes are sufficient to achieve approximately the same data accuracy as the ten sensor nodes in the sensing region. Thus, an optimal set of (six) sensor nodes can achieve the required data accuracy using the ADA model instead of all ten sensor nodes. We choose the six sensor nodes closest to the sink node to perform the data transmission in the network while maintaining the desired accuracy level using the ADA model. Reducing the number of sensor nodes, that is, selecting optimal sensor nodes for data transmission while maintaining the desired accuracy, can reduce the communication overhead and increase the lifetime of the network.
Figure 5. ADA model: data accuracy versus number of nodes
Performance of STDP model to reduce communication overhead: From the previous simulation setup, we select the optimal (six) sensor nodes instead of all ten sensor nodes to perform the communication process in the network using the ADA model while maintaining a desired data accuracy. The data transmission of this optimal set of sensor nodes can be reduced further using the STDP model. Hence, in this simulation setup, we analyze the performance of the STDP model.
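The selection rule used for the ADA simulation, keeping the six nodes closest to the sink, amounts to a k-nearest selection by Euclidean distance. A minimal sketch, assuming the sink knows each node's (x, y) position in metres; the function name and any coordinates passed to it are hypothetical, since the paper does not list node positions.

```python
import math

def nearest_nodes(positions, sink, k):
    """Return the ids (sorted) of the k nodes closest to the sink.

    positions: dict node_id -> (x, y) in metres; sink: (x, y).
    Distance is Euclidean; ties keep the dict's insertion order.
    """
    ranked = sorted(positions, key=lambda i: math.dist(positions[i], sink))
    return sorted(ranked[:k])
```

Calling it with the ten node positions and `k=6` would return the six node IDs that the STDP experiments then operate on.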

Since the ADA model selects about six sensor nodes (the optimal sensor nodes) to perform the data transmission instead of all ten, we assume that these optimal sensor nodes, with node IDs 2, 4, 5, 7, 9 and 10, are selected for the data transmission of signal statistics. These sensor nodes are chosen because they are close to the sink node in the sensing region. Further, using STDP, we can reduce the communication overhead of the data transmission from these six sensor nodes to the sink node. To analyze our simulation results for the STDP model, Fig. 6 reports the percentage of sensor readings transmitted by the optimal nodes for a varying error threshold value (β). For each sensor node, a prediction error is calculated for each data stream block of N sensor readings. If the prediction error is greater than the error threshold value, the respective block of sensor readings is transmitted by the sensor node to the sink node. Thus, instead of transmitting all the data streams, each sensor node delivers only a subset of its data stream blocks to the sink node under the data reduction strategy. Moreover, Fig. 6 also shows the statistical variations of the signal for the sensor nodes in the network. The statistical data streams of the sensor nodes are almost similar because the data streams are spatially and temporally correlated. Thus, these optimal sensor nodes transmit only subsets of their data streams using the STDP model, further reducing the data transmission in the network.
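The block-based transmission percentage can be computed as below. The per-block error measure (root-mean-square of the prediction residual) and the source of `predictions` are assumptions; the section only states that a prediction error is computed per block of N readings and compared with β.

```python
import numpy as np

def transmission_percentage(readings, predictions, block_size, beta):
    """Percentage of N-sample blocks that a node would transmit (sketch).

    A block is transmitted when its prediction error, taken here as the
    root-mean-square difference between readings and predictions within
    the block, exceeds beta. Trailing samples that do not fill a whole
    block are ignored.
    """
    x = np.asarray(readings, dtype=float)
    p = np.asarray(predictions, dtype=float)
    n_blocks = len(x) // block_size
    if n_blocks == 0:
        return 0.0
    err = x[: n_blocks * block_size] - p[: n_blocks * block_size]
    # One RMS error per block of `block_size` consecutive readings.
    rms = np.sqrt(np.mean(err.reshape(n_blocks, block_size) ** 2, axis=1))
    return 100.0 * np.count_nonzero(rms > beta) / n_blocks
```

Evaluating this over a range of `beta` values and block sizes N = 4 and N = 5 yields curves of the same form as the transmission-percentage plots.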

In Fig. 7, we compare the percentage of data transmission for data stream block sizes N = 4 and N = 5 for sensor node ID 2 with respect to the error threshold value (β). The simulation result shows that if we transmit data streams with a block size of, we can reduce the percentage of transmission cost more effectively than by transmitting data streams with a block size of. Another conclusion is that when the data stream block size is small, tracking and learning of the statistical signal is easier, whereas with a larger block size, a better estimate of the signal statistics is obtained.




[Figure 6 plot: one curve each for Nodes 2, 4, 5, 7, 9 and 10; x-axis: error threshold value (β)]

Figure 6. Percentage of data transmission versus error threshold value (β).

Tracing malicious nodes using STDP model: Finally, using the STDP model, we can find the number of malicious nodes in the network. A malicious node senses inaccurate data readings, and the signal variations of malicious nodes are much higher than those of normal nodes. In our network, we assume that nodes with IDs 5 and 9 are malicious and nodes with IDs 2, 4, 7 and 10 are normal. Fig. 8 shows that the variations of the weight vectors of nodes 5 and 9 are much higher than those of the normal nodes. The weight vector ($w_{mal\_i,k}^{new}$) of a malicious node shows more abnormal signal variation than those of the normal nodes, and a sensor node whose weight vector shows such abnormal variation at the sink node is identified as malicious.

Figure 7. Percentage of data transmission versus error threshold value (β) for data stream block sizes N = 4 and N = 5

Thus, we can easily trace the number of malicious nodes in the network by analyzing the signal variation of the weight vector ($w_{i,k}^{new}$) of each node. The signal variation of the weight vector and the variance of each sensor node are summarized in Table I. From Table I, we conclude that the weight vectors of nodes 5 and 9 show more abnormal statistical variations than those of the normal nodes. Finally, nodes 5 and 9 can be discarded from the network to obtain better data accuracy when we do not have a priori knowledge of the signal statistics of the physical environment.

Figure 8. Weight vector versus data sample block size (N = 5) of each sensor node in the STDP model, used to trace malicious nodes.

Table I: Weight vectors of sensor nodes for tracing malicious nodes in the network (Nor: normal nodes; Mal: malicious nodes)

VI. Conclusions

In this paper, we presented two data accuracy models to sense accurate data from the physical environment. The first, the Estimated Data Accuracy (EDA) model, requires a priori information about the signal statistics of the environment; it senses more accurate data and performs better than other information accuracy models. The second, the Adaptive Data Accuracy (ADA) model, selects an optimal set of sensor nodes in the network using an adaptive approach and does not require any a priori knowledge of the signal statistics of the environment. Moreover, we described the Spatio-Temporal Data Prediction (STDP) model, which reduces the communication overhead of these optimal sensor nodes using data reduction strategies. Simulation results show that STDP can learn and track the internal variation of the signal statistics of the environment. Finally, we proposed a mechanism using the STDP model to trace malicious nodes in the network, if any, arising from an extreme physical environment (e.g., heavy rainfall). Extensive simulations validate the EDA, ADA and STDP models, including in the presence of malicious nodes.